Automatic interlinear glossing as two-level sequence classification

Authors

  • Tanja Samardzic
  • Robert Schikowski
  • Sabine Stoll

Abstract

Interlinear glossing is a type of annotation of morphosyntactic categories and crosslinguistic lexical correspondences that allows linguists to analyse sentences in languages they do not necessarily speak. Automating this annotation is necessary to provide glossed corpora large enough for quantitative studies. In this paper, we present experiments on the automatic glossing of Chintang. We decompose the task of glossing into steps suitable for statistical processing. We first perform grammatical glossing as standard supervised part-of-speech tagging. We then add lexical glosses from a stand-off dictionary, applying context disambiguation in a way similar to word lemmatisation. We obtain the highest accuracy scores of 96% for grammatical and 94% for lexical glossing.
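
The two-level decomposition described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which tagger or disambiguation features were used, so this sketch substitutes a simple backoff most-frequent-label tagger for the first (grammatical) level and resolves dictionary ambiguity at the second (lexical) level by matching the predicted label, by analogy with POS-informed lemmatisation. It assumes morphemes are already segmented. The morphemes, labels, `TRAIN`, `DICTIONARY`, `BackoffTagger`, `lexical_gloss`, and `gloss` are all hypothetical and invented for illustration; none of it is Chintang data.

```python
from collections import Counter, defaultdict

# Invented toy data (NOT Chintang): each sentence is a list of pre-segmented
# morphemes paired with a level-1 label. Grammatical morphemes carry their
# gloss (e.g. "PST"); lexical morphemes carry only a class placeholder
# (e.g. "LEX.V") to be filled in at level 2.
TRAIN = [
    [("ko", "LEX.N"), ("-na", "ERG"), ("sa", "LEX.V"), ("-ta", "PST")],
    [("mu", "LEX.N"), ("-na", "ERG"), ("ko", "LEX.V"), ("-ta", "PST")],
    [("mu", "LEX.N"), ("sa", "LEX.V"), ("-ri", "NPST")],
]

# Invented stand-off dictionary: form -> list of (class, lexical gloss) senses.
DICTIONARY = {
    "ko": [("LEX.N", "house"), ("LEX.V", "build")],
    "mu": [("LEX.N", "person")],
    "sa": [("LEX.V", "eat")],
}


class BackoffTagger:
    """Level 1: supervised tagger that predicts the most frequent label for the
    (previous form, form) context, backing off to the form alone and then to
    the most frequent label overall."""

    def __init__(self, sentences):
        self.by_context = defaultdict(Counter)
        self.by_form = defaultdict(Counter)
        self.overall = Counter()
        for sentence in sentences:
            prev = "<s>"  # sentence-start marker
            for form, label in sentence:
                self.by_context[(prev, form)][label] += 1
                self.by_form[form][label] += 1
                self.overall[label] += 1
                prev = form

    def tag(self, forms):
        labels, prev = [], "<s>"
        for form in forms:
            # .get() avoids creating empty Counters; an unseen key yields None.
            counts = (self.by_context.get((prev, form))
                      or self.by_form.get(form)
                      or self.overall)
            labels.append(counts.most_common(1)[0][0])
            prev = form
        return labels


def lexical_gloss(form, label, dictionary):
    """Level 2: choose the dictionary sense whose class matches the level-1
    label, the way a lemmatiser uses a POS tag to pick among lemma candidates."""
    senses = dictionary.get(form, [])
    for cls, gloss_text in senses:
        if cls == label:
            return gloss_text
    # No matching class: fall back to the first sense, or leave the form as-is.
    return senses[0][1] if senses else form


def gloss(forms, tagger, dictionary):
    """Run both levels: keep grammatical glosses, fill in lexical ones."""
    result = []
    for form, label in zip(forms, tagger.tag(forms)):
        if label.startswith("LEX"):
            result.append(lexical_gloss(form, label, dictionary))
        else:
            result.append(label)
    return result


if __name__ == "__main__":
    tagger = BackoffTagger(TRAIN)
    # "ko" after "-na" is tagged LEX.V, so the verb sense "build" is chosen.
    print(gloss(["mu", "-na", "ko", "-ta"], tagger, DICTIONARY))
    # → ['person', 'ERG', 'build', 'PST']
```

Conditioning on the previous morpheme is what lets the same form `ko` receive different lexical glosses in different contexts here; a realistic system would presumably use richer features and a proper sequence model rather than this toy backoff scheme.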

Similar sources

The Effects of Glossing Conventions on L2 Vocabulary Recognition and Production

To investigate the effects of different glossing conventions on vocabulary recognition and recall, 158 participants were given a pre-test to make sure that they did not have any prior knowledge of the target words. Reading passages with four different glossing conventions (interlinear, marginal, pre-text, and post-text) were given to eight groups. Four groups received interlingual glosses and f...

Enriching Interlinear Text using Automatically Constructed Annotators

In this paper, we will demonstrate a system that shows great promise for creating Part-of-Speech taggers for languages with little to no curated resources available, and which needs no expert involvement. Interlinear Glossed Text (IGT) is a resource which is available for over 1,000 languages as part of the Online Database of INterlinear text (ODIN) (Lewis and Xia, 2010). Using nothing more tha...

A Morphological Glossing Assistant

One of the tasks language documenters face is that of assigning glosses to function morphemes, including affixes. These glosses are typically used in marking up interlinear text at a morpheme level. But without a morphological parser, marking up interlinear text is tedious and error-prone. Ideally, a parser will be guided not only by the form and syntagmatic properties of morphemes, but also by...

Interlinear Glossing and its Role in Theoretical and Descriptive Studies of African and other Lesser–Documented Languages

In a manuscript, William Labov (1987) states that although linguistics is a field with a long historical tradition and with a high degree of consensus on basic categories, it experiences a fundamental division concerning the role that quantitative methods should play as part of the research progress. Linguists differ in the role they assign to the use of natural language examples in linguistic r...

Automatic Creation of Interlinear Text for Philological Purposes

Interlinear text presents a collection of interpretations of a manuscript. Whereas such a form is often compiled by a single author or a single team of scholars, we here consider automatic creation of interlinear text out of independently created linguistic resources. In terms of mathematical structures, we investigate the constraints one may want to impose on the rendering and pair-wise alignm...

Publication date: 2015